Message ID | 20240408172605.635508-5-leitao@debian.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net : dqs: optimize if stall threshold is not set | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On Mon, 8 Apr 2024 10:25:56 -0700 Breno Leitao wrote: > With the previous change, struct dqs->stall_thrs will be in the hot path > (at queue side), even if DQS is disabled. > > The other fields accessed in this function (last_obj_cnt and num_queued) > are in the first cache line, let's move this field (stall_thrs) to the > very first cache line, since there is a hole there. > > This does not change the structure size, since it moves an short (2 > bytes) to 4-bytes whole in the first cache line. Doesn't this move the cache line bouncing problem to the other side? Eric said "copy" I read that as "have two fields with the same value". I think it's single digit number of alu instructions we'd be saving here, not super convinced patch 3 is the right trade off...
On Tue, Apr 09, 2024 at 05:21:49PM -0700, Jakub Kicinski wrote: > On Mon, 8 Apr 2024 10:25:56 -0700 Breno Leitao wrote: > > With the previous change, struct dqs->stall_thrs will be in the hot path > > (at queue side), even if DQS is disabled. > > > > The other fields accessed in this function (last_obj_cnt and num_queued) > > are in the first cache line, let's move this field (stall_thrs) to the > > very first cache line, since there is a hole there. > > > > This does not change the structure size, since it moves an short (2 > > bytes) to 4-bytes whole in the first cache line. > > Doesn't this move the cache line bouncing problem to the other side? I think so. Looking at dql_check_stall(), it only uses fields in the second cache line, except now 'dql->stall_thrs' that is in the first cache line. > Eric said "copy" I read that as "have two fields with the same value". Sorry, I misunderstood it. I can create two fields, and update them together at the only place where they will be updated (bql_set_stall_thrs). > I think it's single digit number of alu instructions we'd be saving > here, not super convinced patch 3 is the right trade off... Right. I was more concerned about the write barriers (smp_wmb()) inside the loop which happen quite frequently. But, if you think that the this is not the right approach, I can drop this whole patchset. Do you think a profiler will us here? Thanks!
On Wed, 10 Apr 2024 06:52:56 -0700 Breno Leitao wrote: > > Doesn't this move the cache line bouncing problem to the other side? > > I think so. Looking at dql_check_stall(), it only uses fields in the > second cache line, except now 'dql->stall_thrs' that is in the first > cache line. We do read num_queued at the beginning of dql_completed(). So maybe we we move the read of the threshold there we will be fine.
diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h index 869afb800ea1..281298e77a15 100644 --- a/include/linux/dynamic_queue_limits.h +++ b/include/linux/dynamic_queue_limits.h @@ -50,6 +50,9 @@ struct dql { unsigned int adj_limit; /* limit + num_completed */ unsigned int last_obj_cnt; /* Count at last queuing */ + /* Stall threshold (in jiffies), defined by user */ + unsigned short stall_thrs; + unsigned long history_head; /* top 58 bits of jiffies */ /* stall entries, a bit per entry */ unsigned long history[DQL_HIST_LEN]; @@ -71,8 +74,6 @@ struct dql { unsigned int min_limit; /* Minimum limit */ unsigned int slack_hold_time; /* Time to measure slack */ - /* Stall threshold (in jiffies), defined by user */ - unsigned short stall_thrs; /* Longest stall detected, reported to user */ unsigned short stall_max; unsigned long last_reap; /* Last reap (in jiffies) */
With the previous change, struct dqs->stall_thrs will be in the hot path (at queue side), even if DQS is disabled. The other fields accessed in this function (last_obj_cnt and num_queued) are in the first cache line, let's move this field (stall_thrs) to the very first cache line, since there is a hole there. This does not change the structure size, since it moves an short (2 bytes) to 4-bytes whole in the first cache line. This is the new structure format now: struct dql { unsigned int num_queued; /* Total ever queued */ unsigned int last_obj_cnt; /* Count at last queuing */ ... short unsigned int stall_thrs; /* 12 2 */ /* XXX 2 bytes hole, try to pack */ ... /* --- cacheline 1 boundary (64 bytes) --- */ ... /* Longest stall detected, reported to user */ short unsigned int stall_max; /* 100 2 */ /* XXX 2 bytes hole, try to pack */ }; Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Breno Leitao <leitao@debian.org> --- include/linux/dynamic_queue_limits.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)