Message ID | 20230520052957.798486-2-leobras@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | Move usages of struct __call_single_data to call_single_data_t |
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 06caacd77ed6..44201e18681f 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -105,6 +105,11 @@ struct request {
 	};
 
 	struct block_device *part;
+
+	union {
+		struct __call_single_data csd;
+		u64 fifo_time;
+	};
 #ifdef CONFIG_BLK_RQ_ALLOC_TIME
 	/* Time that the first bio started allocating this request. */
 	u64 alloc_time_ns;
@@ -189,11 +194,6 @@ struct request {
 		} flush;
 	};
 
-	union {
-		struct __call_single_data csd;
-		u64 fifo_time;
-	};
-
 	/*
 	 * completion callback.
 	 */
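The commit message below gives the rationale for the new position: the union now sits before any CONFIG_*-dependent field, so its offset no longer depends on the kernel config. A minimal standalone userspace sketch of that effect follows (it is not kernel code; struct example_old, struct example_new, the always_there filler and CONFIG_FOO are made-up stand-ins, and the 96-byte filler is chosen only to mirror the 64-bit offset quoted below):

/*
 * Toy illustration: a field placed after a config-dependent member moves
 * around depending on the config, while a field placed before it does not.
 * Build twice, with and without -DCONFIG_FOO, and compare the offsets.
 */
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

struct example_old {
	uint64_t always_there[12];	/* 96 bytes of config-independent fields */
#ifdef CONFIG_FOO
	uint64_t foo;			/* config-dependent field before the union */
#endif
	union {
		uint64_t csd[4];	/* 32-byte stand-in for struct __call_single_data */
		uint64_t fifo_time;
	};
};

struct example_new {
	uint64_t always_there[12];
	union {				/* union moved before any config-dependent field */
		uint64_t csd[4];
		uint64_t fifo_time;
	};
#ifdef CONFIG_FOO
	uint64_t foo;
#endif
};

int main(void)
{
	printf("old layout: csd at offset %zu\n", offsetof(struct example_old, csd));
	printf("new layout: csd at offset %zu\n", offsetof(struct example_new, csd));
	return 0;
}

Built with 'gcc -DCONFIG_FOO', the old layout reports csd at offset 104 while the new one stays at 96; without the define, both report 96.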
Currently, request->csd has type struct __call_single_data, while call_single_data_t is defined in include/linux/smp.h:

/* Use __aligned() to avoid to use 2 cache lines for 1 csd */
typedef struct __call_single_data call_single_data_t
	__aligned(sizeof(struct __call_single_data));

As the comment above the typedef suggests, having struct __call_single_data split across 2 cachelines means the cpu receiving the request has to fetch / invalidate / bounce 2 cachelines instead of 1 before it can run the requested function. This usually hurts performance, due to the extra memory access and the extra cacheline used.

As an example, on a 64-bit machine with

CONFIG_BLK_RQ_ALLOC_TIME=y
CONFIG_BLK_WBT=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_INLINE_ENCRYPTION=y

pahole reports:

struct request {
[...]
	union {
		struct __call_single_data csd;	/* 240 32 */
		u64 fifo_time;			/* 240  8 */
	};					/* 240 32 */
[...]
}

With this config, any cacheline size between 32 and 256 bytes causes csd to be split across 2 cachelines: csd->node (16 bytes) lands in the first cacheline, while csd->func (8 bytes) and csd->info (8 bytes) land in the second.

During blk_mq_complete_send_ipi(), csd->func and csd->info are written, and when it calls __smp_call_single_queue(), csd->node is written as well. On the cpu that receives the request, csd->func and csd->info are read by __flush_smp_call_function_queue() and csd->node is written by csd_unlock(), so both cachelines containing csd end up being accessed.

To avoid this, request->csd has to be placed somewhere else in the struct, so that it always fits in a single cacheline, without introducing any new hole in the struct.

To achieve this, move request->csd to right after 'struct block_device *part'. The rationale for this placement is:
- No CONFIG_*-dependent field comes before csd, so there is no chance of unexpected holes on particular configs.
- On 64-bit machines, csd ends up at byte offset 96.
- On 32-bit machines, csd ends up at byte offset 64.

This means that after this change request->csd is always cacheline aligned for cacheline sizes >= 32 bytes (64-bit) and >= 16 bytes (32-bit), as long as struct request itself is cacheline aligned.

With the above change, the size of struct request is not supposed to change in any configuration.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 include/linux/blk-mq.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
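As a sanity check on the numbers quoted in the commit message, the small userspace sketch below (not part of the patch; straddles_cacheline() and the list of cacheline sizes are illustrative assumptions) applies the 32-byte csd size to the old offset (240, from the pahole output) and the new offset (96): the old placement crosses a cacheline boundary for every cacheline size from 32 to 256 bytes, while the new one never does.

/* Check whether a field at @offset spanning @size bytes crosses a
 * cacheline boundary for a given cacheline size. */
#include <stdio.h>
#include <stdbool.h>

static bool straddles_cacheline(unsigned int offset, unsigned int size,
				unsigned int cacheline)
{
	return (offset / cacheline) != ((offset + size - 1) / cacheline);
}

int main(void)
{
	const unsigned int csd_size = 32;	/* sizeof(struct __call_single_data), 64-bit */
	const unsigned int cachelines[] = { 32, 64, 128, 256 };

	for (unsigned int i = 0; i < sizeof(cachelines) / sizeof(cachelines[0]); i++) {
		unsigned int cl = cachelines[i];

		printf("cacheline %3u: offset 240 -> %s, offset 96 -> %s\n", cl,
		       straddles_cacheline(240, csd_size, cl) ? "splits csd" : "single line",
		       straddles_cacheline(96, csd_size, cl) ? "splits csd" : "single line");
	}
	return 0;
}

Built with a plain 'gcc', it prints one line per cacheline size, showing "splits csd" for offset 240 and "single line" for offset 96 in every case.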